Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers, and 1,000 random numbers.
Explain the differences among these figures. Do they all indicate that the data are white noise?
As the sample size increases (from 36 random numbers to 360 and then 1,000), the sample autocorrelations tend towards 0 and the ACF bounds become narrower. If more than 5% of the spikes fall outside the bounds, the series is probably not white noise. That is not the case in any of these ACF plots, so all three series are consistent with white noise.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
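The critical values are at different distances from the mean of zero because, for white noise, the sample autocorrelations are expected to lie within \(\pm \frac{1.96}{\sqrt{T}}\), where T is the length of the time series. As T gets bigger, this range gets smaller. The autocorrelations themselves differ between figures because they are sample estimates computed from different random draws, and their sampling variation (roughly \(1/\sqrt{T}\)) shrinks as T grows.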
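The shrinking bounds are easy to check numerically. The following is a minimal sketch, assuming standard normal white noise and an arbitrary seed (neither comes from the exercise); it prints the bound \(\pm 1.96/\sqrt{T}\) for the three sample sizes together with the share of the first 20 sample autocorrelations that fall outside it.

```r
# Simulate white noise of three lengths and compare the ACF bounds.
set.seed(123)                     # arbitrary seed, for reproducibility only
sizes <- c(36, 360, 1000)

for (n in sizes) {
  x <- rnorm(n)                                     # assumed N(0, 1) white noise
  r <- acf(x, lag.max = 20, plot = FALSE)$acf[-1]   # drop the lag-0 value
  bound <- 1.96 / sqrt(n)
  cat(sprintf("T = %4d  bound = +/-%.3f  share outside = %.2f\n",
              n, bound, mean(abs(r) > bound)))
}

# The bounds alone, one per sample size
bounds <- 1.96 / sqrt(sizes)
round(bounds, 3)
```

The bound falls from roughly 0.33 at T = 36 to about 0.06 at T = 1,000, while for white noise the share of spikes outside the bounds should stay near 5% whatever the sample size.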
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
head(ibmclose)
## Time Series:
## Start = 1
## End = 6
## Frequency = 1
## [1] 460 457 452 459 462 459
ggtsdisplay(ibmclose, main="Daily closing IBM stock price", ylab="Stock Price", xlab="Days")
The time plot shows clear trends, for example a downward trend from about day 210 to day 270, so the level of the series is not constant. The ACF plot is useful for identifying non-stationary time series: for a non-stationary series the ACF decreases slowly, and here it does decrease only slowly as the lag increases. The PACF plot shows the partial autocorrelations between the series and its own lags. Here the first partial autocorrelation is close to 1 and all the others are close to 0, which is the pattern expected from a random walk. Thus we can conclude that the series is non-stationary and should be differenced to make it stationary.
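The slow ACF decay described above can be reproduced on simulated data. This is only an illustration under assumed settings (a Gaussian random walk of arbitrary length 300 with an arbitrary seed), not the IBM series itself; it compares the first ten autocorrelations of the level with those of the first difference.

```r
# ACF of a random walk versus ACF of its first difference (base R only).
set.seed(42)                      # arbitrary seed
rw <- cumsum(rnorm(300))          # simulated random walk, a non-stationary series

acf_level <- acf(rw, lag.max = 10, plot = FALSE)$acf[-1]        # lags 1..10
acf_diff  <- acf(diff(rw), lag.max = 10, plot = FALSE)$acf[-1]  # lags 1..10

# Rows: level and differenced series; columns: lags 1 to 10
round(rbind(level = acf_level, differenced = acf_diff), 2)
```

The level series should show autocorrelations that start near 1 and fall slowly, while the differenced series should show values scattered around 0, which is the same contrast that motivates differencing ibmclose.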
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
usnetelec
#Annual US net electricity generation (billion kwh) for 1949-2003
head(usnetelec)
## Time Series:
## Start = 1949
## End = 1954
## Frequency = 1
## [1] 296.1 334.1 375.3 403.8 447.0 476.3
ggtsdisplay(usnetelec,
main="Annual US net electricity generation",
ylab="billion kwh",
xlab="year")
The time plot shows an upward trend, and the ACF decays slowly. The PACF has a first lag close to 1 and all other lags close to 0. This confirms that the series is non-stationary.
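The exercise asks for a Box-Cox transformation, so a rough reminder of what that transformation computes may help. The helper below, box_cox, is a hypothetical name used for illustration and covers only positive data; it is a sketch of the standard Box-Cox family, not the code of the BoxCox() function used in this solution.

```r
# Box-Cox family for positive data:
#   w = log(y)                     if lambda == 0
#   w = (y^lambda - 1) / lambda    otherwise
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# First three usnetelec values, copied from the head() output above
y <- c(296.1, 334.1, 375.3)

box_cox(y, 0)     # log transformation
box_cox(y, 0.5)   # square-root-like transformation
box_cox(y, 1)     # lambda = 1 only shifts the data by 1
```

Values of lambda below 1 compress large observations more than small ones, which is why the transformation is mainly useful when the variation grows with the level of the series.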
Now let's apply the Box-Cox transformation and look at the results.
bc_trans <- BoxCox(usnetelec, BoxCox.lambda(usnetelec))
ggtsdisplay(bc_trans,
main=paste("Annual US net electricity generation - BoxCox lambda=", round(BoxCox.lambda(usnetelec), 3)),
ylab="billion kwh",
xlab="year")
After the Box-Cox transformation we do not see a noticeable change, which could be because the series is annual (so there is no seasonality) and its variation changes little with the level; a Box-Cox transformation mainly stabilises variance. Next we use the KPSS test, in which the null hypothesis is that the data are stationary, and we look for evidence that the null hypothesis is false. Consequently, small p-values (e.g., less than 0.05) suggest that differencing is required. In the ur.kpss() output this corresponds to a test statistic larger than the reported critical value.
bc_trans %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 1.4583
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic (1.4583) is much larger than the 1% critical value (0.739), so the null hypothesis is rejected and the Box-Cox transformed data are not stationary. We now use the ndiffs() function to determine the order of differencing.
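To make the comparison between the test statistic and the critical values less of a black box, here is a hand-rolled sketch of the level-stationarity KPSS statistic in base R. The function name kpss_stat, its lags argument and the default lag rule are illustrative assumptions; the code is not taken from the urca package, so small numerical differences from ur.kpss() are possible.

```r
# KPSS statistic for level stationarity (sketch):
#   eta = sum(S_t^2) / (n^2 * lrv)
# where S_t are partial sums of the demeaned series and lrv is a
# Bartlett-kernel estimate of the long-run variance.
kpss_stat <- function(x, lags = NULL) {
  x <- as.numeric(x)
  n <- length(x)
  if (is.null(lags)) lags <- trunc(4 * (n / 100)^0.25)  # assumed short-lag rule
  e <- x - mean(x)            # residuals from a regression on a constant
  s <- cumsum(e)              # partial sums
  lrv <- sum(e^2) / n         # lag-0 term of the long-run variance
  for (j in seq_len(lags)) {
    w <- 1 - j / (lags + 1)   # Bartlett weight
    lrv <- lrv + 2 * w * sum(e[(j + 1):n] * e[1:(n - j)]) / n
  }
  sum(s^2) / (n^2 * lrv)
}

set.seed(7)                          # arbitrary seed
rw <- cumsum(rnorm(200))             # simulated random walk (non-stationary)
kpss_stat(rw)                        # compare with 0.463 (5%) and 0.739 (1%)
kpss_stat(diff(rw))                  # first difference of the random walk
```

A large value relative to the critical values printed by summary() is evidence against stationarity; a strongly trending series such as 1:100 gives a statistic far above the 1% value.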
ndiffs(bc_trans)
## [1] 2
ndiffs() indicates that two differences are required for the Box-Cox transformed data. Let's first apply a single difference and look at the results.
bct.diff <- bc_trans %>% diff()
bct.diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.4315
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
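A single difference reduces the test statistic to 0.4315. This is below the 5% critical value (0.463) but above the 10% critical value (0.347), so the evidence is borderline: at the 5% level we do not reject the null hypothesis of stationarity, but ndiffs() suggested two differences, so a second difference may be worth checking.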
ggtsdisplay(bct.diff,
main="Annual US net electricity generation - BoxCox Diff",
ylab="billion kwh",
xlab="year")
usgdp
# Quarterly US GDP. 1947:1 - 2006.1.
head(usgdp)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1947 1570.5 1568.7 1568.0 1590.9
## 1948 1616.1 1644.6
ggtsdisplay(usgdp, main="Quarterly US GDP",xlab="Year",ylab="US Dollars")
The time plot shows an upward trend, and the ACF decays slowly. The PACF has a first lag close to 1 and all other lags close to 0. This confirms that the series is non-stationary.
Now let's apply the Box-Cox transformation and look at the results.
usgdp.bc_trans <- BoxCox(usgdp, BoxCox.lambda(usgdp))
ggtsdisplay(usgdp.bc_trans,
main=paste("Quarterly US GDP - BoxCox lambda=", round(BoxCox.lambda(usgdp), 3)),
xlab="Year",
ylab="US Dollars")
It is evident here that the Box-Cox transformation, with lambda 0.366, has removed the curvature in the original data. Next we use the KPSS test, in which the null hypothesis is that the data are stationary.
usgdp.bc_trans %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 4.8114
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic (4.8114) is much larger than the 1% critical value (0.739), so the null hypothesis is rejected and the Box-Cox transformed data are not stationary. We now use the ndiffs() function to determine the order of differencing.
ndiffs(usgdp.bc_trans)
## [1] 1
ndiffs() indicates that one difference is required for the Box-Cox transformed data.
usgdp.bct.diff <- usgdp.bc_trans %>% diff()
usgdp.bct.diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.2013
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
After one difference the test statistic falls to 0.2013, below even the 10% critical value (0.347) and well within the range we would expect for stationary data. So we can conclude that the differenced data are stationary.
ggtsdisplay(usgdp.bct.diff,
main="Quarterly US GDP - BoxCox Diff",
xlab="Year",
ylab="US Dollars")
mcopper
head(mcopper)
## Jan Feb Mar Apr May Jun
## 1960 255.2 259.7 249.3 258.0 244.3 246.8
# Monthly copper prices
ggtsdisplay(mcopper, main="Monthly copper prices", ylab="pounds per ton", xlab="Year")
The time series shows slight seasonality and periods of both downward and upward trend. We can also see a sudden spike in the 2000s. These features indicate that it is a non-stationary time series. Now let's apply the Box-Cox transformation and look at the results.
mcop.bc_trans <- BoxCox(mcopper, BoxCox.lambda(mcopper))
ggtsdisplay(mcop.bc_trans,
main=paste("Monthly copper prices - BoxCox lambda=", round(BoxCox.lambda(mcopper), 3)),
ylab="pounds per ton",
xlab="Year")
After the Box-Cox transformation, with lambda 0.192, the seasonality is visible. Next we use the KPSS test, in which the null hypothesis is that the data are stationary.
mcop.bc_trans %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 6.2659
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic (6.2659) is much larger than the 1% critical value (0.739), so the null hypothesis is rejected and the Box-Cox transformed data are not stationary. We now use the ndiffs() function to determine the order of differencing.
ndiffs(mcop.bc_trans)
## [1] 1
ndiffs() indicates that one difference is required for the Box-Cox transformed data.
mcop.bct.diff <- mcop.bc_trans %>% diff()
mcop.bct.diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0573
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
After one difference the test statistic falls to 0.0573, far below the 10% critical value (0.347) and well within the range we would expect for stationary data. So we can conclude that the differenced data are stationary.
ggtsdisplay(mcop.bct.diff,
main="Monthly copper prices - BoxCox Diff",
ylab="pounds per ton",
xlab="Year")
enplanements
# Monthly US domestic enplanements
head(enplanements)
## Jan Feb Mar Apr May Jun
## 1979 21.12 22.92 25.90 24.38 23.41 26.82
ggtsdisplay(enplanements, main="US Domestic Revenue Enplanements", ylab="millions", xlab="Year")
This time series has an upward trend and seasonality. We also see a sudden drop around 2002. Thus the series is non-stationary. Now let's apply the Box-Cox transformation and look at the results.
enpl.bc_trans <- BoxCox(enplanements, BoxCox.lambda(enplanements))
ggtsdisplay(enpl.bc_trans,
main=paste("US Domestic Revenue Enplanements - BoxCox lambda=", round(BoxCox.lambda(enplanements), 3)),
ylab="millions",
xlab="Year")
After the Box-Cox transformation, with lambda -0.227, the seasonality is visible. Next we use the KPSS test, in which the null hypothesis is that the data are stationary.
enpl.bc_trans %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 4.3785
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic (4.3785) is much larger than the 1% critical value (0.739), so the null hypothesis is rejected and the Box-Cox transformed data are not stationary. We now use the ndiffs() function to determine the order of differencing.
ndiffs(enpl.bc_trans)
## [1] 1
ndiffs() indicates that one difference is required for the Box-Cox transformed data.
enpl.bct.diff <- enpl.bc_trans %>% diff()
enpl.bct.diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0151
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
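After one difference the test statistic falls to 0.0151, far below the 10% critical value (0.347) and well within the range we would expect for stationary data. So we can conclude that the differenced data are stationary, although for a seasonal series like this one a seasonal difference may still be worth considering, which the KPSS test on its own does not address.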
ggtsdisplay(enpl.bct.diff,
main="US Domestic Revenue Enplanements - BoxCox Diff",
ylab="millions",
xlab="Year")
visitors
# Monthly Australian short-term overseas visitors. May 1985-April 2005
head(visitors)
## May Jun Jul Aug Sep Oct
## 1985 75.7 75.4 83.1 82.9 77.3 105.7
ggtsdisplay(visitors,
main="Overseas visitors to Australia",
ylab="Thousands of people",
xlab="Year")
This time series has an upward trend and seasonality, so it is non-stationary. Now let's apply the Box-Cox transformation and look at the results.
visi.bc_trans <- BoxCox(visitors, BoxCox.lambda(visitors))
ggtsdisplay(visi.bc_trans,
main=paste("Overseas visitors to Australia - BoxCox lambda=", round(BoxCox.lambda(visitors), 3)),
ylab="Thousands of people",
xlab="Year")
It is evident here that the Box-Cox transformation, with lambda 0.278, helps to stabilise the variation. Next we use the KPSS test, in which the null hypothesis is that the data are stationary.
visi.bc_trans %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 4.5233
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic (4.5233) is much larger than the 1% critical value (0.739), so the null hypothesis is rejected and the Box-Cox transformed data are not stationary. We now use the ndiffs() function to determine the order of differencing.
ndiffs(visi.bc_trans)
## [1] 1
ndiffs() indicates that one difference is required for the Box-Cox transformed data.
visi.bct.diff <- visi.bc_trans %>% diff()
visi.bct.diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0519
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
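After one difference the test statistic falls to 0.0519, far below the 10% critical value (0.347) and well within the range we would expect for stationary data. So we can conclude that the differenced data are stationary, although for a seasonal series like this one a seasonal difference may still be worth considering, which the KPSS test on its own does not address.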
ggtsdisplay(visi.bct.diff,
main="Overseas visitors to Australia - BoxCox Diff",
ylab="Thousands of people",
xlab="Year")
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349627V"], frequency=12, start=c(1982,4))
ggtsdisplay(myts,
main="Retail Sales",
ylab="Sales",
xlab="Year")
There is clear annual seasonality, with retail sales increasing from October to December. There is also a consistent upward trend and no evident cyclic behaviour. Now let's apply the Box-Cox transformation and look at the results.
sale.bc_trans <- BoxCox(myts, BoxCox.lambda(myts))
ggtsdisplay(sale.bc_trans,
main=paste("Retail Sales - BoxCox lambda=", round(BoxCox.lambda(myts), 3)),
ylab="Sales",
xlab="Year")
It is evident here that the Box-Cox transformation, with lambda -0.058, helps to stabilise the seasonal variation. Next we use the KPSS test, in which the null hypothesis is that the data are stationary.
sale.bc_trans %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 6.2172
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic (6.2172) is much larger than the 1% critical value (0.739), so the null hypothesis is rejected and the Box-Cox transformed data are not stationary. We now use the ndiffs() function to determine the order of differencing.
ndiffs(sale.bc_trans)
## [1] 1
ndiffs() indicates that one difference is required for the Box-Cox transformed data.
sale.bct.diff <- sale.bc_trans %>% diff()
sale.bct.diff %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0175
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
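After one difference the test statistic falls to 0.0175, far below the 10% critical value (0.347) and well within the range we would expect for stationary data. So we can conclude that the differenced data are stationary, although for a seasonal series like this one a seasonal difference may still be worth considering, which the KPSS test on its own does not address.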
ggtsdisplay(sale.bct.diff,
main="Retail Sales - BoxCox Diff",
ylab="Sales",
xlab="Year")
Use R to simulate and plot some data from simple ARIMA models.
Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).
Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
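The code that the exercise refers to is not reproduced above. A minimal sketch that follows the stated set-up (\(\phi_1 = 0.6\), \(\sigma^2 = 1\), \(y_1 = 0\)) might look like the following; the series length of 100, the seed, and the helper name sim_ar1 are assumptions made for illustration.

```r
# Simulate an AR(1) process: y_t = phi1 * y_{t-1} + e_t, starting from y_1 = 0.
sim_ar1 <- function(phi1, n = 100, e = rnorm(n)) {
  y <- numeric(n)                  # y[1] stays at 0
  for (t in 2:n) {
    y[t] <- phi1 * y[t - 1] + e[t]
  }
  ts(y)
}

set.seed(1)                        # arbitrary seed
y_ar1      <- sim_ar1(0.6)         # the case asked for in the exercise
y_ar1_high <- sim_ar1(0.95)        # phi1 close to 1
y_ar1_neg  <- sim_ar1(-0.6)        # negative phi1
```

Calling plot() or autoplot() on each series gives the time plots. As \(\phi_1\) approaches 1 the series wanders more slowly and looks smoother, values near 0 look like white noise, and negative values make the series oscillate around zero.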
Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2=1\).
Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
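One possible way to write the MA(1) generator is sketched below. The helper name sim_ma1, the series length of 100, the seed, and the convention that the pre-sample error \(e_0\) is 0 are all assumptions rather than part of the exercise.

```r
# Simulate an MA(1) process: y_t = e_t + theta1 * e_{t-1}, with e_0 taken as 0.
sim_ma1 <- function(theta1, n = 100, e = rnorm(n)) {
  y <- numeric(n)
  y[1] <- e[1]                     # no previous error available (e_0 = 0 assumed)
  for (t in 2:n) {
    y[t] <- e[t] + theta1 * e[t - 1]
  }
  ts(y)
}

set.seed(2)                        # arbitrary seed
y_ma1     <- sim_ma1(0.6)          # the case asked for in the exercise
y_ma1_neg <- sim_ma1(-0.6)         # negative theta1 for comparison
```

Because an MA(1) series depends on only the current and previous error, changing \(\theta_1\) alters the short-run smoothness of the plot (positive values smooth it slightly, negative values make it choppier) without producing the long swings seen in an AR(1) series with large \(\phi_1\).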
Generate data from an ARMA(1,1) model with \(\phi_1=0.6\), \(\theta_1=0.6\) and \(\sigma^2=1\).
Generate data from an AR(2) model with \(\phi_1=-0.8\), \(\phi_2=0.3\) and \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)
Graph the latter two series and compare them.
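The last two series can be generated with explicit loops, which also works for the non-stationary AR(2) case. The sketch below assumes a length of 100, an arbitrary seed, zero starting values, and the same errors for both series; none of these choices comes from the exercise. It also checks the AR(2) stationarity condition through the roots of \(1 - \phi_1 z - \phi_2 z^2\).

```r
set.seed(3)                        # arbitrary seed
n <- 100
e <- rnorm(n)                      # shared N(0, 1) errors

# ARMA(1,1): y_t = 0.6 * y_{t-1} + e_t + 0.6 * e_{t-1}, with y_1 = 0 assumed
arma11 <- numeric(n)
for (t in 2:n) {
  arma11[t] <- 0.6 * arma11[t - 1] + e[t] + 0.6 * e[t - 1]
}

# AR(2): y_t = -0.8 * y_{t-1} + 0.3 * y_{t-2} + e_t, with y_1 = y_2 = 0 assumed
ar2 <- numeric(n)
for (t in 3:n) {
  ar2[t] <- -0.8 * ar2[t - 1] + 0.3 * ar2[t - 2] + e[t]
}

# Roots of 1 + 0.8 z - 0.3 z^2; a root inside the unit circle
# means the AR(2) process is non-stationary.
ar2_roots <- polyroot(c(1, 0.8, -0.3))
Mod(ar2_roots)
```

One of the roots has modulus below 1, which matches the note in the exercise: when plotted, the ARMA(1,1) series fluctuates around zero with roughly constant spread, whereas the AR(2) series oscillates with growing amplitude.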
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.
Should you include a constant in the model? Explain.
Fit the model using R and examine the residuals. Is the model satisfactory?
Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
Does auto.arima() give the same model you have chosen? If not, which model do you think is better?